Towards a formal framework for linguistic annotations

نویسندگان

  • Steven Bird
  • Mark Liberman
چکیده

‘Linguistic annotation’ is a term covering any transcription, translation or annotation of textual data or recorded linguistic signals. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of annotation formats and demonstrate a common conceptual core. This provides the foundation for an algebraic framework which encompasses the representation, archiving and query of linguistic annotations, while remaining consistent with many alternative file formats.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A formal framework for linguistic annotation

Linguistic annotation" covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions – audio, video and/or physiological recordings – or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, "named entit...

متن کامل

Towards Adaptation of Linguistic Annotations to Scholarly Annotation Formalisms on the Semantic Web

This paper explores how and why the Linguistic Annotation Framework might be adapted for compatibility with recent more general proposals for the representation of annotations in the Semantic Web, referred to here as the Open Annotation models. We argue that the adapted model, in addition to being interoperable with other annotations and annotation tools, also resolves some representational lim...

متن کامل

Ontology-Based Interface Specifications for a NLP Pipeline Architecture

The high level of heterogeneity between linguistic annotations usually complicates the interoperability of processing modules within an NLP pipeline. In this paper, a framework for the interoperation of NLP components, based on a data-driven architecture, is presented. Here, ontologies of linguistic annotation are employed to provide a conceptual basis for the tag-set neutral processing of ling...

متن کامل

Towards International Standards for Language Resources

This paper describes the Linguistic Annotation Framework (LAF) developed by the International Standards Organization TC32 SC4, which is to serve as a basis for harmonizing existing language resources as well as developing new ones. We then describe the use of the LAF to represent the American National Corpus and its linguistic annotations.

متن کامل

The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning

In our modern technological world, Computer-Assisted Language learning (CALL) is a new realm towards learning a language in general, and learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotatio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998